Edges weighted with the STRING combined score will be useful both as a baseline for comparison against our own method and for testing the community detection analysis before the edge weights from our method are ready. There are two ways to get these weights:

  1. Use the STRING online service
  2. Parse the STRING summary features we have already extracted in this notebook

Unfortunately, the online service returns a table that does not include the Entrez IDs originally submitted, so its output would have to be mapped back to Entrez IDs for our pipeline. The faster route is to load the pickled object that was created in the notebook above to generate features, and keep only the combined score for each pair:


In [1]:
cd ../../features


/home/gavin/Documents/MRes/features

In [2]:
import csv

In [3]:
ls


abundance.Entrez.full.txt@       head.training.nolabel.negative.Entrez.vectors.txt@
abundance.Entrez.traintest.txt@  pulldown.edges.Entrez.txt@
autogit.log                      pulldown.nolabel.Entrez.vectors.txt@
c2s.Entrez.full.txt@             training.nolabel.negative.Entrez.vectors.txt
c2s.Entrez.traintest.txt@        training.nolabel.positive.Entrez.vectors.txt

In [4]:
import sys

In [5]:
sys.path.append("/home/gavin/Documents/MRes/opencast-bio/")

In [6]:
import ocbio.string

In [7]:
import pickle

In [8]:
# pickles are binary data, so open the file in binary mode
with open("../string/human.Entrez.string.pickle", "rb") as f:
    stringfeatures = pickle.load(f)

In [32]:
# Look up the STRING combined score for each pulldown pair and write out
# only the pairs with a non-zero score.
with open("../forGAVIN/pulldown_data/pulldown.interactions.Entrez.tsv") as pulldownpairfile, \
        open("pulldown.string.edges.tsv", "w") as stringedgefile:
    cp = csv.reader(pulldownpairfile, delimiter="\t")
    cs = csv.writer(stringedgefile, delimiter="\t")
    for l in cp:
        # the feature dictionary is indexed by the unordered pair of Entrez IDs
        pair = frozenset(l)
        # the combined score is the last entry of the feature vector
        combinedscore = float(stringfeatures[pair][-1])
        # small tolerance instead of an exact comparison with zero
        if combinedscore > 0.0000001:
            cs.writerow(l + [combinedscore])
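
The lookup in the cell above depends on the pair being indexed as a `frozenset`, so the order of the two Entrez IDs in the pulldown file does not matter. A minimal sketch of that step on made-up data follows. It assumes the unpickled object behaves like a plain dictionary mapping a `frozenset` of two Entrez ID strings to a feature vector whose last entry is the combined score. The name `toy_features`, the helper `nonzero_edges` and all IDs and values are hypothetical and are not part of `ocbio`.

```python
# Toy stand-in for the unpickled STRING features (hypothetical IDs and values).
# Assumption: keys are frozensets of two Entrez ID strings, and the last
# entry of each feature vector is the combined score.
toy_features = {
    frozenset(["1", "2"]): ["0", "0.9", "0.9"],
    frozenset(["3", "4"]): ["0", "0", "0"],
}


def nonzero_edges(pairs, features, threshold=1e-7):
    """Return [id_a, id_b, score] for each pair scoring above threshold."""
    edges = []
    for pair in pairs:
        # frozenset makes the lookup independent of the ID order
        score = float(features[frozenset(pair)][-1])
        if score > threshold:
            edges.append(list(pair) + [score])
    return edges


# ("2", "1") finds the entry stored under {"1", "2"}; the zero-score pair is dropped.
edges = nonzero_edges([["2", "1"], ["3", "4"]], toy_features)
print(edges)
```

A pair that is absent from a plain dictionary would raise a `KeyError` in this sketch; whether the real feature object behaves the same way is not shown here.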

In [33]:
!head pulldown.string.edges.tsv